Management Summary
This is a case study in exploratory data analysis of manually collected GPS data. With the help of several data wrangling packages (XML, tidyverse) and visualization packages (ggplot2, highcharter, leaflet), we explore relationships in the data; these are, however, not investigated further with hypothesis tests.
Introduction
In spring 2020 I joined a local sailing club on the Alster lake in Hamburg, Germany. This summer I used their boats extensively and built up my dinghy sailing skills. I recorded some of these trips with GPS and examined them with various visualization tools. These are some of my insights:
- Apparently I don’t like sailing on Wednesdays.
- Thursday is Alster exploration day.
- Conger and Kielzugvogel can get you anywhere.
- On the Möwe, one apparently prefers to stay close to the mooring.
- Corona enhances one-handed sailing skills.
- GPS tracks of Regatta races look like balls of wool.
- The center of the Alster is (as expected) the sailing hotspot.
Data overview
Since I did not record all sessions (of course), we start with an overview of the data used in this analysis:
- Period: Sunday, 17.05.2020 to Saturday, 17.10.2020
- Number of recorded GPS points: 87,786
- Number of recorded days: 43
- Total distance recorded: 220 nautical miles, which is approximately 407 km
- Number of used boat types: 10
- Number of different sailing partners: 15
- 43 sessions in 21.9 weeks yields about 2 sessions per week on average, or one session every 3.6 days
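The session frequency in the last bullet can be reproduced with a few lines of base R. This is only a minimal sketch using the period and session count stated above; the variable names are my own illustration and not necessarily those used in the repository.

```r
# Recording period and session count as stated in the overview above
start_date <- as.Date("2020-05-17")
end_date   <- as.Date("2020-10-17")
n_sessions <- 43

# Length of the period in days and weeks
period_days  <- as.numeric(end_date - start_date)
period_weeks <- period_days / 7

# Average frequency, expressed both ways
sessions_per_week <- n_sessions / period_weeks
days_per_session  <- period_days / n_sessions

round(c(weeks             = period_weeks,
        sessions_per_week = sessions_per_week,
        days_per_session  = days_per_session), 1)
```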
Calendar View
The days in the data set can be displayed without aggregation:
Observation: September was the most active sailing month (14 training sessions).
Days of the week
Now we can split the dataset according to the contained variables, beginning with the weekday.
Frequency distribution
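Counting sessions per weekday could look roughly like the following sketch. The dates are made up for illustration (they are not the recorded sessions), and the weekday is derived from the ISO weekday number so that the result does not depend on the system locale.

```r
# Hypothetical session dates; NOT the recorded sessions
session_dates <- as.Date(c("2020-05-17", "2020-05-19", "2020-05-26",
                           "2020-06-04", "2020-06-06", "2020-06-09"))

# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday),
# which is independent of the system locale
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
weekday <- factor(day_names[as.integer(format(session_dates, "%u"))],
                  levels = day_names)

# Frequency table; weekdays without sessions show up with a count of 0
weekday_counts <- table(weekday)
weekday_counts
```

Using a factor with all seven levels keeps empty weekdays in the table, which is what makes a gap such as a missing Wednesday visible.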
Observation: On Wednesdays, the big Alster race, the Kangaroo regatta, seems to spoil my sailing, whereas I seem to like participating in the club’s own Tuesday regatta. Alternatively: am I so tired after our Tuesday regatta that I would rather stay home on Wednesdays?
Spatial distribution
We use the Leaflet package for interactive visualisation of all GPS tracks (use mouse wheel or soft buttons for zooming, use legend for weekday switch):
Observation: On Saturdays I like to stay close to the mooring, whereas on Thursdays I sail all the way to the university.
Headings and wind direction
For every GPS point we know the current heading (a.k.a. the “driving direction”). We can count these and visualise them as a histogram (frequency diagram) arranged like a compass.
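The counting step behind such a compass histogram can be sketched in base R. The headings below are invented for illustration, and the sector width of 30° is an assumption rather than the value used for the actual plot.

```r
# Hypothetical headings in degrees (0 = north, 90 = east); NOT the recorded data
headings <- c(10, 14, 20, 350, 160, 165, 171, 185, 95, 15, 168, 12)

# Bin the headings into 30-degree sectors: [0,30), [30,60), ..., [330,360]
sector_width <- 30
breaks  <- seq(0, 360, by = sector_width)
sectors <- cut(headings %% 360, breaks = breaks,
               right = FALSE, include.lowest = TRUE)

# Number of GPS points per sector
counts <- table(sectors)
counts
```

With ggplot2, counts like these could then be drawn as bars with `geom_col()` and bent into a circle with `coord_polar()`, which gives the compass look.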
Headings around 15° (North-Northeast) and 165° (South-Southeast) seem to be rather popular with me. This is obvious considering the geographical shape of the Alster lake: With its slim North-South orientation one goes more “up and down” than “left and right”.
Observation: The usual start from the OSG mooring is towards North-Northeast. Furthermore, we often have Southwesterly and Northwesterly wind, so exactly those headings should be rare.
Boats
Frequency Distribution
With 14 sessions Conger was my favourite boat. This has two reasons: On the one hand it’s a very beginner-friendly boat (I only did my certification in autumn 2019 and had never touched a sailboat before) - the other reason you can find in the analysis of sailing partners later in this article.
Observation: Conger boats are great for learning how to sail.
Spatial distribution
Observation: Conger and Kielzugvogel can get you anywhere; with the Möwe you had better stay close to the mooring.
Which boat was the fastest?
An interesting question, of course, is which average speed (given in knots) was reached with each boat.
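A per-boat average like this could be computed as in the following sketch. The data frame, its column names and the assumption that speeds arrive in metres per second are all hypothetical; the recorded data may already be in knots.

```r
# Hypothetical GPS points: boat type and speed over ground in metres per second
points <- data.frame(
  boat      = c("Conger", "Conger", "470er", "470er", "C55", "C55"),
  speed_mps = c(1.5, 2.0, 2.5, 3.0, 2.4, 2.8)
)

# One knot is one nautical mile (1852 m) per hour
points$speed_kn <- points$speed_mps * 3600 / 1852

# Mean speed per boat, fastest boat first
avg_speed <- aggregate(speed_kn ~ boat, data = points, FUN = mean)
avg_speed <- avg_speed[order(-avg_speed$speed_kn), ]
avg_speed
```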
Observation: Interestingly, both extremes in terms of weight are also the fastest: The lightweight high performance dinghy 470er and the massive keel boat C55. The J70 is the Bundesliga competition boat and should normally be faster, but on that particular day there wasn’t much wind apparently :-) Next season I should check the C55 a bit more!
Sailing Partners
Frequency Distribution
Observation: With 11 sessions I was most often alone. This was mainly due to the COVID-19 restrictions in May and June 2020 - I had no choice but to learn one-handed sailing.
Spatial distribution of sailing partners
We use a static (i.e. not interactive) visualisation of the GPS tracks:
Observation: The tracks of my racing partners Christoph, Bernd and Jochem are shaped like balls of wool.
Favourite Alster Regions
We consider the two-dimensional density function of all driven tracks. In plain English: We can divide the Alster lake into small rectangles and count how often we hit each rectangle during our sailing sessions. Afterwards, we can colour the rectangles green to red according to their frequency - like a COVID-19 hotspot map!
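The "count per rectangle" idea can be sketched in base R as follows. The points are random and only roughly placed around the Alster, the grid size of 10 x 20 is an arbitrary assumption, and simple binned counts stand in for a proper density estimate.

```r
# Hypothetical track points (longitude, latitude); NOT the recorded data
set.seed(1)
lon <- runif(500, min = 9.996, max = 10.009)
lat <- runif(500, min = 53.556, max = 53.579)

# Divide the bounding box into a 10 x 20 grid of rectangles
lon_breaks <- seq(9.995, 10.010, length.out = 11)
lat_breaks <- seq(53.555, 53.580, length.out = 21)
lon_bin <- cut(lon, breaks = lon_breaks, include.lowest = TRUE)
lat_bin <- cut(lat, breaks = lat_breaks, include.lowest = TRUE)

# Count how many points fall into each rectangle
grid_counts <- table(lon_bin, lat_bin)

# Map the counts to a green-to-red colour scale (20 steps)
palette      <- colorRampPalette(c("green", "yellow", "red"))(20)
colour_index <- cut(as.vector(grid_counts), breaks = 20, labels = FALSE)
cell_colours <- matrix(palette[colour_index], nrow = nrow(grid_counts))
```

The matrix `grid_counts` could then be drawn, for example, with base R's `image()` using the same palette, or overlaid on a map.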
Observation: The red hotspots could be the club’s Tuesday race and the mooring on the lower left is indeed visible.
Further Research
This is just a small, visualisation-driven exploratory data analysis of the 2020 season, mostly univariate and without testing the hypotheses found. Interesting questions arose:
- Can I see the correlation between boat length and boat speed in my data?
- Is there any correlation between wind speed and boat types? Do I prefer certain boats for certain wind conditions?
- Besides the 220 nautical miles on the Alster lake, I also completed 280 nm on the Baltic Sea in 2020 - we didn’t include these here, but that would be an interesting analysis, too.
These questions could easily be examined using hypothesis tests or machine learning methods, which would be out of scope here. The computer chips in the basement are, however, ready and I still need something to do in 2021 :-)
Credits
Thanks to OSG for the great community, the crazy boats and the fun things we do together!
The Technical Stuff
Data collection was carried out using the apps Komoot and Waterspeed. The analysis was conducted using R 4.0 and the following useful helpers:
- Data Input: readr, readxl
- Data Wrangling: dplyr, purrr, tidyr, lubridate, glue
- Graphics: highcharter, ggplot2, randomcoloR, yarrr
- Spatial Analysis:
- Output: rmarkdown, knitr, prettydoc
All code for the calculations and visualisations can be downloaded from my GitHub repository: https://github.com/shosaco/sailing_analyses. This page is reachable at https://shosaco.github.io/sailing_analyses.